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\D ■ Abstract 
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O. 

Neural coding is a field of study that concerns how sensory information is represented 

en ' 

in the brain by networks of neurons. The link between external stimulus and neural re- 

sponse can be studied from two parallel points of view. The first, neural encoding refers 
to the mapping from stimulus to response, and primarily focuses on understanding how 
neurons respond to a wide variety of stimuli, and on constructing models that accurately 
describe the stimulus-response relationship. Neural decoding, on the other hand, refers 
to the reverse mapping, from response to stimulus, where the challenge is to reconstruct 
a stimulus from the spikes it evokes. Since neuronal response is stochastic, a one-to-one 
mapping of stimuli into neural responses does not exist, causing a mismatch between 
the two viewpoints of neural coding. Here, we use these two perspectives to investigate 
the question of what rate coding is, in the simple setting of a single stationary stimu- 
lus parameter and a single stationary spike train represented by a renewal process. We 
show that when rate codes are defined in terms of encoding, i.e., the stimulus parame- 
ter is mapped onto the mean firing rate, the rate decoder given by spike counts or the 



sample mean, does not always efficiently decode the rate codes, but can improve effi- 
ciency in reading certain rate codes, when correlations within a spike train are taken 
into account. 



1 Introduction 

Sensory and behavioral states are represented by neuronal responses. Determining 
which code is used by neurons i s important in order to understand how the brain car- 



ries out information processing (IDayan & Abbott , 



2001; 



Riekeet al. . 



19971) . Coding 



schemes used by neurons can be divided approximately into two categories. In rate 
coding, the stimulus is mapped onto the firing rate, defined as the average number of 
spikes per unit time. A variation in the number of emitted spikes in response to the 
same stimulus across trials, is then considered noise. In temporal coding, on the other 
hand, the stim ulus is encoded in moments of the spike pattern that have higher order 



than the mean (Theu nissen & M iller. 



1995) 



While neural codes are characterized in terms of these encoding views (i.e., how 
the neurons map the stimulus onto the features of spike responses), these are often in- 
vestigated and validated using decoding. From the decoding viewpoint, rate coding is 
operationally defined by counting the number of spikes over a period of time, with- 
out taking into account any correlation structure among spikes. Any scheme based on 
such an operation is equivalent to decoding under the stationary Poisson assumption, 
because the number of spikes over a period of time, or the sample mean of interspike 
intervals (ISIs), is a sufficient statistic for the rate parameter of a homogeneous Poisson 
process. In this manuscript, a decoder based on counting the number of spikes, or on 
taking the sample mean of ISIs, is labeled as "rate decoder". Similarly, temporal cod- 
ing can be defined by decoding the stimulus using a statistical model with a correlation 
structure between spikes (such as the MI model, introduced below). If such a decoder 
improves on the performance of the rate decoder, it indicates that signifi cant informa 



tion about the stimulus i s carried in the temporal aspect of spike trains ([Jacobs et al. , 



2009; 



Pill ow et al. . 



2005). 



A simple statistical model with a correlation structure has been introduced in the 



literature, taking the intensity function of a point process to be a product of two factors: 

\(t,s*(t)) = <i>(t)g(t-s,(t)), (1) 

where s*(t) represents the last spike time preceding t. This statistical model with the 
intens ity function © has been called the multiplicative intensity (MI) model by 



Aalen 



Kass & Ventura 



1978) and the multiplicative inhomogeneous Markov interval model by 
(|200l|) . <f>(t) is the free firing rate, which depends only on the stimulus, and g(t — s*(£)) 
is the recovery function, which describes the dependency of the last spike time preced- 
ing t and hence allows the MI model to have a correlation structure between spikes. 
Note that Eq.® becomes the intensity function of an inhomogeneous Poisson process 
if the recovery function is constant in time. It has bee n reported t hat t he MI model 



2009J), which en- 



enhances decoding performance in real data analysis (jJacobs et all 
courages use of the MI model to test temporal codes. 

Although neural codes can be defined in terms of either encoding or decoding, the 
resulting codes generally differ from one another. Here, we investigate the relation 
between the two viewpoints of neural coding in terms of rate and temporal coding 
schemes. Specifically, we consider, for the sake of analytical tractability, a simple set- 
ting of a single stationary stimulus parameter and a single stationary spike train rep- 
resented by a renewal process, and investigate the extent to which decoders of each 
scheme decode neural codes that are defined in terms of encoding. Our main claim is 
that when rate codes are defined in terms of encoding, i.e., the stimulus parameter is 
mapped onto the mean firing rate, the rate decoder does not always efficiently decode 
the rate codes, whereas the temporal decoder can improve efficiency in reading certain 
rate codes. 

In order to deduce our results, we develop, in section |2] a statistical theory based on 
asymptotic estimation, i.e., inference from a large number of ISIs. However, care must 
be taken when results based on asymptotic analysis are translated into non- asymptotic 
cases, which are certainly relevant in more realistic coding contexts. This will be ad- 
dressed in section |3l 



2 Theory 

2.1 Definition of encoding and decoding 

We suppose, for simplicity, that neural spikes are described by a stationary renewal 
process. The response of single neurons is then described by an ISI density, p(x\9), 
where x E [0, oo), and 9 E C (— oo, oo) is a one-dimensional stimulus parameter. 
The renewal assumption is n ot exactly true for actu al neural data, but often provides 



a reasonable approximation (|Troy & RobsonL Il992|) . Let fj, = E(x\9) be the mean 



parameter, E{-\9) being the expectation with respect to p(x\9). 



Consider first the rate encoding scheme. Since the early work of lAdrian & Zotterman 



(|1926|) . there has been a search for a functional relationship between stimulus parame- 
ters and the average firing rate, which is often described as a function of the stimulus 
parameters. This motivates us to formulate rate encoding as a one-to-one mapping from 
9 to fJi{9). The variation in x around the mean fi is then regarded as noise. In short, the 
rate encoding scheme can formally be defined as follows: 

Definition 1 If there exists a one-to-one and differentiable mapping 9 H Y /u(6>), the 
scheme is rate encoding. 

The assumption of differentiability in y u(6 ) ) with respect to 9 is required for analytical 
purposes, but is also reasonable physiologically because it shows that a small change in 
9 results in a small, smooth change in /i(6>). 

Temporal encoding, on the other hand, intuitively means that the stimulus is encoded 
in statistical structures of ISIs beyond the firing rate. Since it allows for many alterna- 
tives, we do not explicitly define temporal encoding here, but instead give an example 
below. Let p(x\n, k) be a dispersion model, where /i is the mean and k is the dispersion 
parameter that characterizes moments of the ISIs of higher order than the mean. If the 



stimulus parameter is mapped onto the disp ersion parameter, 9 <-)■ k 



9), thi s scheme 



201 ih . 



can be categorized under temporal encoding (|Kostal. Lansky & Pokoral . 

For decoding, we assume an ISI density, q(x\(f)), 4> E $ C (—00,00), which is 
chosen according to the decoding schemes introduced below. We suppose that decod- 
ing is performed by the maximum likelihood estimation (MLE) with q{x\<p). In rate 
decoding, one usually counts the number of spikes over a period of time, without taking 

4 



into account any dependency among spikes. This is equivalent to decoding under the 
Poisson assumption, because the number of spikes is a sufficient statistic for the rate pa- 
rameter of a homogeneous Poisson process. Thus, q(x\<f>) is taken to be the exponential 
distribution, q(x\<j>) = 0exp(— <j>x), for rate decoding. 

In temporal decoding, on the other hand, where a temporal dependency of spike 
timing relative to the last spike is considered, we take q{x\<j>) to be the MI model. Here, 
the ISI distribution of the MI model is constructed as follows. Since we only take into 
account stationary renewal processes, the rate factor in Eq.© is reduced to a constant, 
and then the intensity function, A (a;), of the MI model becomes 



\{x) = (f>g(x), 



,0 



where <p e [0, oo) is the free firing rate and g(x)(> 0) is the recovery function LU. The 
ISI distribution of the MI model is then obtained as 

q(x\<j>) = <f>g(x)exp[-<f>G(x)], (2) 

where 

PX 

G(x) = / g{u)du. 
Jo 

In order for the MI model to be well behaved as a decoder, we assume that the variance 



1995) that 



of G(x) is finite. It is obvious from the factorization theorem (jSchervisbl 

G(x) is a sufficient statistic for <\>. Note that Eq.© becomes an exponential distribution 

with firing rate if g(x) = 1, x > 0. The two decoding schemes are summarized as 

follows: 

Definition 2 In rate decoding, 9 is decoded with q(x\<f>) being the exponential distribu- 
tion via the MLE. In temporal decoding, 9 is decoded with q(x\4>) being the MI model 
via the MLE. 

We use the MI model in temporal decoding for the following reasons. First, the 
inhomogeneous version of the MI model given by Eq.(0Q) is useful in practice, as it 



can be easily fitted to data by well-established statistical methods (jKass & Ventura . 



Since the units of A (a;) are those of firing rate (i.e., spikes per unit t i me), b y convention, we let 



also have units of firing rate, leaving g(x) dimensionless dKass & Ventura. 1200 II) 



200ll : lDiMatteo et all 1200 1|) . In fact. I Jacobs et all (120091) demonstrated the importance 



of temporal coding by 



(IMcCullagh&Neldei . 



using 



1989; 



this model. Second, generalized linear models (G 



Paninski, 2004; 



Paninski et al. 



2007; 



Truccolo et al. , 



Ms) 



2005), 



which have been used extensively for statistical analysis of neural data, include the MI 
model as a special case. Specifically, the GLM corresponds to the MI model when the 



spiking history term contains only the la st spike and a 
soft-threshold integrate-and-fire models (IPaninski et al. , 



og-li nk function is used (e.g., 



2008)) 



In order to investigate the extent to which decoders of each scheme decode neural 
codes that are defined in terms of encoding, in section 12.21 we introduce a correlation 
quantity p 2 e given by Eq.©, which measures decoding performance with q(x\4>). 



2.2 Correlation quantity 

We shall assume that p(x\9) and q(x\(j)) satisfy t he traditional regularity assumptions 
needed for standard asymptotics (|Schervishl Il995|) . We first define a correlation quan- 
tity that measures a "similarity" between two models. Let 



s p (x,9) 



and 



Sq{X, 



d log p(x\9) 



d\ogq{x\ 



be the score functions of p{x\9) and q(x\<f>), respectively. For a given 9, the parameter 
of the decoder model, <fi, is taken to be a function <p{9) of 9 satisfying 



E[a q {x,<j>(d))\d\=Q. 

We define the square correlation coefficient p 2 e as 

2 _ Cov[s p (x,9),s q (x,<P(9W? E[s p (x,9)s q (x,<P(9W? 

Pe ~ Vax[s p (x,9)\6]Vai[s q (x,<j>(9))\9] J e E[s q (x,<p(9)) 2 \6] ' 



(3) 



(4) 



where Var[- \9] and Cov[- \9] represent, respectively, the variance and the covariance with 
respect to p(x\6), and Jg is the Fisher information defined by 

Je = E[s p (x,9) 2 \9]. 

Note that we used E[s p (x, 9)\9] = in deriving the right-hand side of Eq.©. The 
square correlation coefficient p 2 is related to the c oefficient of determinant, R 2 , used in 



1998) 



a simple regression analysis (Ra wlings et al.L 

pi has the following geometrical property. In a linear space of square integral func- 
tions, the inner product and norm are defined to be 



s p , s q ) e = E(s p s q \9) 



1/2 



vl/2 



\s\\ e =(s,s)^ = E(s 2 \9) 



The square correlation coefficient is then rewritten as 

„ 2 _/ s p (x,9) s q (x, (f)(9)) 2 _ , 

-)g — COS if, 



(5) 



"" "\\s p (x,9)\y\\s q (x,(f)(9))\[ 

where <p is the angle between s p (x, 9) and s q (x ) 4>{9)) with respect to (, ) e . Thus, p 2 B = 1 
if s q (x, (f)(9)) is parallel to s p (x,9), while pj = Oif s q (x, (f)(9)) is orthogonal to s p (x,9). 
In the following, we will give two interpretations of p 2 ,, in terms of statistical infer- 
ence (LemmaO and information theory (LemmalD, which will provide useful insights 
for translating the meaning of p 2 e into the context of neural decoding. 



2.1 Asymptotic efficiency 

Let X\,X2, ■ ■ ■ , x n be independent and identically distributed random variables from 

p(x\9), and <f) n = 4> n (xi, x 2 , ■ ■ ■ , x n ) be the MLE of q(x\4>) based on Xi,y ?, . . . , x n . 



1982). For the 



Then, n — > (f)(9) as n — > oo, where (p(9) satisfies Eq.® (|WhiteL 
inference of 9 from <f) n , we assume that d<f)(9)/d9 ^ 0. An estimator of 9 would, thus, 
be transformed from <p n as 9 n = _1 (0„). We also assume that 9 n is an unbiased 
estimator of 9. The performance of the unbiased estimator is e valuated by its va riance, 



and the ratio of it to its lower bound is called the efficiency (|Schervish , 



1995). The 



following lemma holds under the above conditions. 



Lemma 3 pi gives the asymptotic efficiency ofO r , 



Proo f: Under su itable regularity conditions, it is proven that <\> n is asymptotically nor- 



mal dWhM 



1982): 



\/n((f) n — (f)(0)) — >■ N(0, v) in distribution, 



where 



v = E[s q (x,m) 2 \0}E 



By the delta method (|Schervish . 



ds q {x,<f>{0)) 



d<p 



1995|), we obtain 



(6) 



y/n(9 r , 



—)■ N(0, v/c 2 ) in distribution, 



where 



d(f>(6) 

de 



-E[s p {x,9)s q (x,<l>{0))\0\E 



ds q {x,<f>{9)) 



(7) 



is derived by differentiating Eq.© with respect to 9. Since the lower bound of the 
asymptotic variance is given by the inverse of the Fisher information (i.e., the Cramer- 
Rao lower bound), the asymptotic efficiency is defined by the ratio c 2 J B l /v. Using 
Eqs.© and ©, we obtain c 2 J e 1 /v = p% □ 

2.2 Information-theoretic quantity 

We next connect p 2 e to an information-theoretic measure. Consider a situation in which 
a neuron is subjected to a stimulus chosen from a probability distribution, p(9). In 
information theory, the amount of information about the stimulus transferred through a 



noisy channel is quantified by the mutual information ((Cover & Tomas 



1991) 



p(«) logpWds + j jp{ff) p{x \ff) v*M*)#d*. (« 



The amount of information that can be gained by decoding depends on the probability 
distribution used in a decoder. In order to introduce this information, we revisit an 
information-theoretic interpretation of the mutual information. Suppose that the neuron 



8 



is subjected to a set of stimuli, and consider how many stimuli can be encoded in its 
response. If each stimulus is encoded in a sequence of n(^> 1) ISIs, the upper bound 
on the number of stimuli that can be encoded almost error- free is e nI . In decoding, if 
the true model, p(x\9), is used to build a decoder, the upper bound of the number of 
stimuli that can be decoded almost freely from errors is the same, e nI . If, on the other 



hand, the inaccurate model, q(x\9 
e nI * , where I* < I was derived in 



, is used, then the u pper bound is typically smaller, 



Merhav et al. 



(1994) as 



/• = run - -f^^fm^mrm^Jj^^^Amr^, 

(9) 
with (3* being the value that maximizes I* ((3). Thus, the normalized qua ntity, I* /I, is 



regard e d as an information g ain obtained by using q(x\<p) in decoding. See Latham & Nirenberg 



(2005); 



Oizumi et al 



(|2010f) for more details and use of I* in the context of neural de- 
coding. The following lemma connects I* /I with p 2 ,. 

Lemma 4 Suppose that the mean and variance ofp(8') are given by 9 and e 2 , respec- 
tively. For e< 1, the information ratio is given by 



I* 

T 



O(e). 



(10) 



Proof: For an integrable function, f(x), that is twice differentiable, it follows that 
J f(x)p(9')d9' = f{9) + ^e 2 + 0(e 3 ). 



By using this, we obtain 



iJ 2 



I*(/3) = l3E[s p {x,9)s q {x,4>{9))W I - '—E[s q {x,<t>{9)YW 2 + 0(e"). (11) 



The optimal /3* is obtained by maximizing Eq.dTTb with respect to /3 as 

= E[s p (x,O)a q (x,<j>(0)M , g(f] 
P E[s q {x^{9)?\0] U ' 



(12) 



Substituting Eq.([T2]) into Eq.£[I]) leads to 

W > 2E[s g (x,m) 2 \D] l ' 

In the same manner, the mutual information ([8]) is evaluated as 

/ = ij e e 2 + 0(e 3 ). (14) 

From Eqs.(H3]) and (|T4l . we obtain Eq.flO]). D 

2.3 Properties of p 2 e 

Lemma 5 pi has the following properties: 
i) < p 2 e < 1. 
ii) p\ achieves unity when the MLE ofq(x\(p) is a complete sufficient statistic for 9ci 

Proof: i) is obvious from Eq.©. To prove ii), let <p = 4>( x ) denote the MLE of q(x\(f>). 
Let /i(0) and f2{4>) be unbiased estimators of 6. Then, we have E[fi((p) — f 2 ((f))\9] = 
for all 9 e 0. Since <p is a complete statistic, it follows that fi((f>) = f2{4>), a.s. [pe\ 
for all 9. Thus, all unbiased estimators of 9, which are functions of 0, are equal, a.s. 
[pe]. Now, suppose that there is an unbiased estimator f(x) of 9 with finite variance, 
and define 

e = E[f(x)\e,4>], (15) 

which forms an estimator of 9, since <p is sufficient for 9 and thus the conditional ex- 
pectation given <p does not depend on 9. 9 is unbiased because 

E(9\9) = E[E[f(x)\9j]\9] = E[f(x)\9] = 9. 

Thus, 9 defined by Eq.([T5l) is equal with the one defined in Lemma [3l a.s. \p$\. It 



9 
A statistic 4> is complete if for every measurable, real-valued function /, E[f {4>)\ 9] = for all 9 € 8 

implies f(<fi) = almost surely with respect to p(x\9) (denoted by 'a.s. [peY) for all 9. An interpretation 

of completeness for a sufficient statistic is that it makes the ancillary part of the data independent of 

JLehmann[[l98ll) . 



10 



follows that 



Vai(9\9) 



E[{e-ef\e] 




E[(E[f(x)\9,j>]- 


-ey\e\ 


E[E[f(x)-9\9, 


ti 2 \o] 


E[E[{f{x)-9f 


\0,4>]\0] 


E[{f{x)-9f\9] 




Var[/(x)|0], 





where the inequality follows from Jensen's inequality. Particularly, if we take f(x) to 
be an asymptotically efficient estimator (e.g., the MLE of p(x\9)), Var[/(x) \9] achieves 
the lower bound, J e x , which completes the proof of ii) because p 2 e gives the asymptotic 
efficiency of 9. □ 

From the interpretations and properties given in Lemmas |3] |] and [51 p 2 e can be 
used as a measure of decoding performance of q(x\(f>) when the true model is given by 
p(x\9). We say that q(x\<p) efficiently decodes 9 if p 2 e = 1. If pi > 0, 9 is asymptotically 
decodable with q(x\4>), whereas if p 2 e = 0, 9 is not decodable with q(x\4>). 

2.3 Results 

By using p 2 e defined in Eq.©, we now investigate the extent to which the decoders of 
each scheme decode rate and temporal codes. 

Theorem 6 In rate encoding, if the sample mean is a complete sufficient statistic for 
p, the rate decoder efficiently decodes 9 (i.e., p 2 e = 1 with q{x\4>) being the exponential 
distribution). 

Proof: Since p{9) is a one-to-one mapping, the sample mean is sufficient for 9. On 
the other hand, the MLE of the rate parameter of the exponential distribution is given 
by the sample mean. Therefore, the theorem follows from Lemma[5]ii). □ 

Theorem 7 Let q(x\<f)) be the MI model given by (E]). Either in rate encoding or in 
temporal encoding, 



11 



i) 6 is efficiently decoded (i.e., pi = 1) ifG(x) is a complete sufficient statistic for 

0. 

ii) 9 is asymptotically decodable (i.e., p 2 e > 0) if — 9 g 7^ 0. 

Proof: From Eq.©, the MLE of q(x\(f)) is given by <p = G(x)^ 1 . Thus, i) follows 
from Lemma |5]ii). For the proof of ii), we rewrite @ as follows. 

E[ Sp (x,9)s q (x,4>)\e] = f dl0g ^ 6) s q (x^)p{x\6)dx 

d f 
= qq Sq(x,(f))p(x\9)dx 

= ^E[ Sq {xA)\0] 
= ~E[G(x)\0\, 

where we used Eq.© to obtain the last equation. Inserting </> = (f)(0) into the above 
equation leads to 

E[s p (x,9)s q (x,(f>(e))\0] = -^E[G{x)\9]. (16) 

Through direct calculation, we also obtain 

E[s q {x, <f){6)) 2 \6] = E{(G{x) - E[G(x)\9]) 2 \e} = Vax[G(x)\6]. (17) 

Substituting Eqs.([T6l) and (fTTI) into Eq.©, p 2 e is written as 

d -MG{x)\e\ 2 



■2 



(18) 



J e V&v[G(x) 

Therefore, p 2 > holds if ^MeM ^ . □ 

The results and their consequences are summarized as follows. 

1) In rate encoding, if the sample mean is a complete sufficient statistic for p, the 
rate decoder efficiently decodes the rate code. 

2) If, on the other hand, the sample mean is not sufficient for p in rate encoding, but 
G(x) is chosen so that the value of pi for the temporal decoder is larger than that 

12 



for the rate decoder, the temporal decoder can decode the rate code with greater 
efficiency than the rate decoder. 

3) In temporal encoding, if G(x) is chosen so that — -gjp^ ^ 0, the temporal code 
is asymptotically decodable with the temporal decoder. Particularly, if G{x) can 
be taken to be a complete sufficient statistic for 9, the temporal decoder decodes 
the temporal code efficiently. 

In the following, we will give three examples that illustrate the above consequences. 
We first give an example illustrating consequence 2), where the rate decoder is not 
efficient for decoding a rate code, and the temporal decoder achieves greater efficiency 
than the rate decoder. 



Example 8 Let p{x\ii, k) be a log-normal distribution: 

x I re\2 - 

(19) 



p(x\n,K) = — ^=exp 
xy 2ttk 



(log^ + f) 2 



A' 



2 k 



See iLevind (| 1 99 II) for modeling the stochastic nature of ISIs with the log-normal distri- 
bution. Suppose that the stimulus is encoded in fi, i.e., rate encoding. The sample mean 
is not a sufficient statistic for /i of the distribution, which implies that the rate decoder 
does not decode efficiently. Indeed, p 2 g for the rate decoder is derived in Appendix IA.1I 

as 

K 

(20) 



2 K 



e K - 1 

p 2 g — > if k — > oo, as the distribution becomes more skewed and has a longer right-hand 
tail. 

Instead of the rate decoder, consider using the temporal decoder with the MI model's 
recovery function being 

9{x) = {aX 'J ] , , , (21) 

1 (a, ax/T) 

where T(a, z) is the incomplete gamma function: 

In Eq.(r2TI). a(> 0) determines the shape of g(x) (i.e., g(x) ~ x a_1 near x = 0), and r 
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Figure 1: (a) The shape of the recovery function (1211 for a = 0.8, 1, 3 and 10. (b) 
Pg of temporal decoding as a function of t//jl in Example \\0\ The value of the shape 
parameter of the gamma distribution was taken to be k = 5. pi reaches its maximum 
when p w r. 

represents the correlation timescale between successive spikes. Figure QI a) depicts the 
shape of g(x) for several values of a. It is shown in Appendix I A . 2 1 that for each k > 0, 
the temporal decoder with the recovery function (ETT i achieves pi « 1 as closely as 
possible by taking r to be large enough and a to be small enough, because the sufficient 
statistic G(x) for the parameter <p of the MI model approximates to log a;, which is a 
sufficient statistic for the mean parameter of the log-normal distribution. □ 

The next example illustrates consequence 1), i.e., a situation in which the sample 
mean is sufficient for the mean parameter. 

Example 9 Suppose that p(x\pi, k) is a gamma distribution with the mean p, and the 
shape parameter k: 

P(*lA«,«)= ^ r(K) , (22) 

where T(k) is the gamma function. The gamma distribution has been used to describe 
the stoch astic nature o f ISIs , and its information-theoretic properties have been stud- 



ied (Ikeda & Manton 



(2009) and references therein). Also, suppose that the stimu- 
lus is mapped onto p, (i.e., rate encoding). It is easy to see that the sample mean is 
a complete sufficient statistic for p,, and thus the rate decoder efficiently decodes the 
stimulus (pg = 1), regardless of the value of k. Note that the variance of the sample 
mean achieves the Cramer-Rao lower bound even with a finite sample size, because the 
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gamma distribution is an exponential family distribution (|SchervishL 11995)) . Thus, nei- 
ther the temporal decoder nor the gamma distribution (i.e., the true model) is necessary 
for efficient decoding even with a finite sample size. □ 

The last example illustrates consequence 3). 

Example 10 Consider that the true ISI distribution is given to be the gamma distribu- 
tion (|22|) . and that the stimulus is encoded in k (i.e., temporal encoding). For temporal 
decoding, let us take the recovery function of the MI model to be Eq. (|2T|) . From a direct 
calculation (Appendix I A.3I) . p 2 e is expressed as 

{lE[\ogT(a,a^)\K}} 2 
fe ' J K Var[logi>, afa;)|fi:] ' V ; 

where S[-|k] and Var[-|«;] are taken with respect to p(x\p = 1, k). Note that pi is a 
function of the dimensionless parameter, p/r. pi was numerically computed for each 
value of the parameters, («, p/r). The value of a was taken so as to maximize pi for 
each value of parameters. Figure [TJb) depicts pi as a function of r/p. It is seen from 
this figure that pi takes its maximum near rj p ps 1, which indicates that the MI model 
decodes best when the mean ISI of the true model, p, matches the correlation timescale 
of the MI model, r. □ 

3 Discussion 

Our main results are summarized as follows. First, the rate decoder efficiently decodes 
rate codes if and only if the sample mean is a sufficient statistic for the mean parameter 
of the true model. Second, the temporal decoder improves on the performance of the 
rate decoder by a) decoding temporal codes that the rate decoder fails to read, and b) 
achieving greater efficiency in decoding certain rate codes . 

These results suggest that rate codes in stationary spike trains, which are defined 
as the mapping from the stimulus to the mean firing rate, can further be divided into 
two subcategories when the concept of sufficiency is taken into consideration: one is a 
"strong" rate code, in which the sample mean is a sufficient statistic for decoding, and 
the other is a "weak" rate code, in which the sample mean is not sufficient. We should 
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notice that spike count decoding matches the strong form of rate encoding, but not weak 
form. 

How can decoding results inform us whether or not rate coding is being used? In or- 
der to answer this question in the context of neuronal data analysis, one may decode the 
stimulus with rate and temporal decoders, and compare their decoding performances 



(Jacobs et al 



2009|) . This procedure tells us whether or not the sample mean is suffi- 



cient for decoding the stimulus: if the rate decoder performs as well as the temporal 
decoder, then the sample mean is sufficient; if it does not, then the sample mean is not 
sufficient. In terms of the original question of whether rate coding is being used, only 
in the former case can we translate the decoding result into "strong" rate encoding; in 
the latter case, we cannot conclude which scheme, "weak" rate encoding or temporal 
encoding, is being used. 

The key quantity in our theoretical analysis is the square correlation coefficient, 
pg, which quantifies neural decoding performance. It is worth pointing out that the 
unnormalized quantity of p 2 e \ 



Jo = PeJe 



E[s p (x,9)s q (x, ( p(9)Wf 
E[s q (xA(6)y\e] 



can be regarded as a generalization of the Fisher information, Jg, in the sense that Jg 
becomes Jg if q{x\4>) = p(x\9). Jg has similar properties to J e ; (i) Jg^ 1 gives the 
asymptotic varian ce of the MLE of q{x\4>) (Lemma HJ) as Jg^ 1 gives that of p(x\9) 



(Schervish, 



19951) . and (ii) Jg appears in the leading term of the information, I*, of the 



decoder with q(x\(f> ) (Lemma Sb, as Je does in the mutual information with the limit of 



small input power (Kostal, 



(for review, see 



2010 ). As Jg has been used to measure encoding accuracy 



Day an & AbbottLfeOOlL chap. 3), Jg is used to measure the performance 



of neural decoders. 

It must be noted that our analysis is based on asymptotic theory, which assumes a 
large sample size. The inverse of the Fisher information and its generalization, J|, give 
the lower bounds of the variance of unbiased estimators, but generally do not corre- 
spond to the mean squared error of the estimators with a finite sample size, except for 
special cases of exponential family distributions. Thus, th e results based on asymptotic 



analysis may not be justified for non- asymptotic cases. (|Bethge et al. 



(2002) examined 
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this point in the context of population coding.) Especially, decoding using the "wrong" 
model may severely compromise the accuracy of decoding in non-asymptotic cases. 
One therefore has to check carefully whether analysis using p 2 e provides correct results 
in terms of minimum mean squared error when the asymptotic results are translated into 
non-asymptotic cases. 

Our simple setting of stationary and renewal assumptions does not account for two 
aspects of neuronal spikes that are relevant for neural coding. First, actual spike trains 
exhibit nonstationarity due to both, the dynamics of the stimulus and the nature of the 
neural encoding processes such as adaptation. Rate encoding for this case is generalized 
to the scheme in which the stimulus is mapped onto a time-dependent firing rate, or, the 
marginal intensity function. Then the question we would like to address is whether 
reasonable estimates of the firing rate (e.g., based on spline models or histograms), are 
asymptotically sufficient for decoding the stimulus, which may require more mathe- 
matically careful treatment to be proven. Second, higher-order serial dependencies in 
sequences of ISIs, for which the MI model (OQ) can not account, would certainly be rele- 
vant for neural coding. Accordingly, temporal encoding is generalized to the scheme in 
which the stimulus is mapped onto the higher-order serial dependencies. For temporal 
decoding, the MI model can be generalized by taking the recovery function to depend 
on the whole spiking history, rather than simply on the last spike. Taking into consid- 
eration these two extensions, we suspect that our results summarized at the beginning 
of the Discussion still hold. It would be interesting to examine the relation between en- 
coding and decoding in a more realistic setting, for instance, with biophysically realistic 
neuron models. 

A Appendix: details of derivations 
A.l Derivation of equation (1201) 

Taking the parameter \x = ji{6) and inserting G(x) = x into (fT8l) . p 2 e for the rate 

becomes 

/_. 

<9/i 



d -E{x\&) 



J M Var(2|fl) ' 
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For the log-normal distribution (fT9l ), we have E(x\9) = p, Var(x|6 ) ) = p 2 (e K — 1), and 



J„ 



-E 



dp- 



logp(x\p, k) 



K,p z 



Using these, we obtain Eq. (|20 



A.2 Temporal decoding for the log-normal distribution 

Here, we show that the temporal decoder with recovery function (1211) can achieve p 2 , ~ 
1 as closely as possible in Example [8j Taking the parameter p = p(6) in (fT8~l) , we have 



{jE[G{x)\6] 
J„Var[G(x)|0] ' 



where 



G{x) 



logT(a) -logT^a,— X i 



loelYa) 



Q°-- X X a 



T(a)T a 



T ) 

-a-1 



+ 0(r- a - 1 ), 



for r 3> 1 . Then, 



9£[G(a;)|0] a " 1 <9£(x Q |0) .._„_:, 



9/i 



r(a)r a 9/i 



+ 0(r- Q - i ). 



A similar calculation leads to 



Var[G(x)|0] 



a 



*-l \ 2 



T(a)T c 



- Var(x Q |0) + 0(r 



-2a-l> 



For the log-normal distribution (TT9T ), we also have J M 
/i m eK (m-i)m/2 j m > . Thus, we obtain 



l/(np 2 ) and E(x m \6) 



na 



e Ka - 1 



+ 0(r- 1 ) 



Therefore, lim T _ 



>oo,a— >0 i 



1, that is, we can achieve p 2 e rj 1 as closely as possible by 



taking r to be large enough and a to be small enough. 
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A.3 Derivation of equation (1231) 

Eq. (|2D) is rewritten as 

r dr(a,ax/T) q 

g(x) = -S- — ^-— = — log T (a, acc/r). 

1 (a, ax/T) a ox 

Then, we get 

G{x) = / g{u)du = — [logr(a) — logT(a, ax/r)}, 
Jo a 

where we used T(a, 0) = T(a). Taking k = k{6) in Eq.(fT8T), we obtain 

2 _ {-k E lG(x)\6}} 2 _ {d-E[logT(a,ax/r)\e]} 2 
Pe ~ J K V&r[G(x)\6] ~ J K Var[logr(a,ax/r)|#] ' 

Thus, the scaling property of the gamma distribution leads to Eq.(l23l). 
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